Gamma Modeling of Speech Power and Its On-Line Estimation for Statistical Speech Enhancement

Authors

  • Tran Huy Dat
  • Kazuya Takeda
  • Fumitada Itakura

Abstract

This study shows the effectiveness of using a gamma distribution in the speech power domain as a more general prior distribution for model-based speech enhancement. The gamma model is a superset of the conventional Gaussian model of the complex spectrum and, when its parameters are optimally estimated, provides more accurate prior modeling. We develop a method that adapts the distribution parameters to each actual noisy speech signal on a frame-by-frame basis. Next, using the proposed gamma model, we derive and investigate minimum mean square error (MMSE) and maximum a posteriori (MAP) estimators in several domains: speech spectral magnitude, generalized power, and its logarithm. Finally, the MAP and MMSE filters are compared. Whereas MMSE estimators become more complicated under more general prior distributions, the MAP estimators are obtained in closed form and are therefore well suited to implementation. The adaptive estimation of the distribution parameters provides more accurate prior modeling; this is the principal merit of the proposed method and the reason for its better performance. Based on the experiments, MAP estimation is recommended for its high efficiency and low complexity. Among the MAP-based systems, estimation in the log-magnitude domain performs best for speech recognition, whereas estimation in the power domain is superior for noise reduction.

key words: speech enhancement, speech recognition, gamma modeling, fourth-order moment, MMSE, MAP, spectral magnitude, power, log-spectral magnitude
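The claim that the gamma model contains the Gaussian one can be checked numerically. If the complex spectrum S is zero-mean circular Gaussian, the power |S|^2 is exponentially distributed, which is a gamma distribution with shape 1. The sketch below is a minimal illustration that assumes clean-speech power samples are directly available. It fits the gamma shape and scale by the method of moments, using the second moment of the power (the fourth-order moment of the magnitude, one of the key words above). The function names and the Gamma(0.5) test case are hypothetical. The paper's own on-line estimator, which works from noisy speech frame by frame, is not reproduced here.

```python
import random
import statistics


def fit_gamma_by_moments(power):
    """Method-of-moments fit of Gamma(shape, scale) to power samples X = |S|^2.

    For X ~ Gamma(shape a, scale theta): E[X] = a*theta and
    E[X^2] = a*(a + 1)*theta^2, hence E[X^2] / E[X]^2 = (a + 1) / a.
    E[X^2] = E[|S|^4] is the fourth-order moment of the spectral magnitude.
    """
    m1 = statistics.fmean(power)
    m2 = statistics.fmean([x * x for x in power])
    ratio = m2 / (m1 * m1)
    if ratio <= 1.0:
        raise ValueError("samples are degenerate: need E[X^2] > E[X]^2")
    shape = 1.0 / (ratio - 1.0)
    scale = m1 / shape
    return shape, scale


def complex_gaussian_power(rng, n, sigma=1.0):
    """|S|^2 for a zero-mean circular complex Gaussian S (real/imag std sigma)."""
    return [rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2 for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200_000

    # Gaussian complex spectrum -> exponential power, i.e. gamma with shape 1
    # and scale 2*sigma^2.
    a_gauss, th_gauss = fit_gamma_by_moments(complex_gaussian_power(rng, n))
    # A heavier-tailed power prior, here a gamma with shape 0.5 (illustrative only).
    a_heavy, th_heavy = fit_gamma_by_moments([rng.gammavariate(0.5, 2.0) for _ in range(n)])
    print(f"Gaussian spectrum: shape={a_gauss:.3f}, scale={th_gauss:.3f}")
    print(f"Gamma(0.5) power:  shape={a_heavy:.3f}, scale={th_heavy:.3f}")
```

With sigma = 1 the fitted shape should come out close to 1 and the scale close to 2, while the heavier-tailed samples give a shape near 0.5. A shape below 1 is what lets the gamma prior represent speech power that is more sharply peaked and heavier-tailed than the Gaussian model allows.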


Similar resources

Speech Enhancement Using Gaussian Mixture Models, Explicit Bayesian Estimation and Wiener Filtering

Gaussian Mixture Models (GMMs) of power spectral densities of speech and noise are used with explicit Bayesian estimations in Wiener filtering of noisy speech. No assumption is made on the nature or stationarity of the noise. No voice activity detection (VAD) or any other means is employed to estimate the input SNR. The GMM mean vectors are used to form sets of over-determined system of equatio...

Noise Power Spectral Density Estimation based on Maximum a Posteriori and Generalized Gamma Distribution

Noise power spectral density (PSD) estimation is a crucial part of speech enhancement system due to its contributory effect on the quality of the noise reduced speech. A novel estimation method for color noise PSD on the basis of an assumption of generalized Gamma distribution and maximum a posteriori (MAP) criterion is proposed. In the experiment, generalized Gamma PDF which is a natural exten...

Speech enhancement based on hidden Markov model using sparse code shrinkage

This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a maximum a posteriori (MAP) estimator based on a Laplace-Gaussian (for clean speech and noise respectively) combination in the HMM ...

Proceedings of Meetings on Acoustics

Estimation of the power spectral density (PSD) of noise is crucial for retrieving speech in a noisy environment. Three novel methods for estimating the non-white noise PSD of noisy speech based on a generalized gamma distribution and three criteria are proposed: minimum mean square error (MMSE), maximum a posteriori (MAP) and maximum likelihood estimation (MLE). Because of the highly non-st...

Multiple statistical models for soft decision in noisy speech enhancement

Most speech enhancement algorithms are based on the assumption that speech and noise are both Gaussian in the discrete cosine transform (DCT) domain. For further enhancement of noisy speech in the DCT domain, we consider multiple statistical distributions (i.e., Gaussian, Laplacian and Gamma) as a set of candidates to model the noise and speech. We first use the goodness-of-fit (GOF) test in or...


Journal:
  • IEICE Transactions

Volume 89-D

Published: 2006